Implementation of barycentric resampling for continuous wave searches in 

gravitational wave data 
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We describe an efficient implementation of a coherent statistic for continuous gravitational wave 
searches from neutron stars. The algorithm works by transforming the data taken by a gravitational 
wave detector from a moving Earth bound frame to one that sits at the Solar System barycenter. 
Many practical difficulties arise in the implementation of this algorithm, some of which have not been 
discussed previously. These difficulties include constraints of small computer memory, discreteness 
of the data, losses due to interpolation and gaps in real data. This implementation is considerably 
more efficient than previous implementations of these kinds of searches on Laser Interferometer 
Gravitational Wave (LIGO) detector data. 
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I. INTRODUCTION 

Rapidly rotating neutron stars are among the most 
promising sources of continuous gravitational waves. 
They can emit gravitational waves through a variety of 
mechanisms, including unstable oscillation modes P, [1] 
and deformations of the crust ^ [^-Q • Neutron stars can 
radiate powerful beams of radio waves from their mag- 
netic poles. If a neutron star's magnetic poles are not 
aligned with its rotational axis, the beams sweep through 
space, and if the Earth lies within the sweep of the beams, 
the star is observed as a point source in space emitting 
bursts of radio waves. Such a neutron star is called a 
pulsar 0, Si ■ Since the first discovery [§] , around 2000 
pulsars have been detected p^| - [T^ . 

Due to magnetic dipole radiation and gravitational ra- 
diation, the rotational frequencies of neutron stars slowly 
decrease in time. Other than this effect, gravitational 
waves from isolated rotating neutron stars are essentially 
monochromatic in the rest frame of the star. The waves 
are continuous and their frequency is determined by the 
rotational frequency of the star. The motion of the de- 
tector as the earth rotates about its axis and around the 
sun, however, modulates the phase as well as the ampli- 
tude of the received signal. In order to recover the sig- 
nal from interferometric data optimally, both of these ef- 
fects must be taken into account. Detecting gravitational 
waves from neutron stars could reveal information about 
the strength of neutron star crusts and the equation of 
state of the nuclear matter that makes up the star f(^. 
Continuous gravitational waves may also be p roduced by 
other sources, such as cosmic strings [H, Il4{ . 

There are a number of techniques available for contin- 
uous wave searches. These techniques can be loosely di- 
vided into two categories: (1) coherent methods [isl [T7|. 
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which keep track of the phase of the gravitational wave 
signals over long periods of time, and (2) semi-coherent 
methods [l^, which combine shorter periods of data 
without tracking the phase (for example, taking Fourier 
transforms of short segments of data and then summing 
the power). 

When the sky location and phase evolution of a neu- 
tron star are known, a coherent search for continuous 
gravitational waves is relatively straightforward • As- 
suming that the noise in a gravitational wave detector 
follows Gaussian statistics, in the presence of a signal, 
the signal to noise recovered in a search increases with 
the square root of the amount of data used in the search. 
This is because the signal amplitude grows linearly while 
the noise follows a random walk. Thus, with enough 
data, it is possible to recover any continuous signal out 
of noisy data. 

If certain parameters of the signal (sky location, fre- 
quency, spindowns and binary parameters) are not known 
the search becomes much more involved. The reason is 
that the number of points needed to cover the search 
parameter space (and ensure no signals are lost) grows 
like a large power of the amount of data used [19| . This 
makes the sensitivity of gravitational wave searches com- 
putationally bound: One cannot simply integrate arbi- 
trary amounts of data to gain sensitivity because there 
is not enough computational power available to perform 
the search. Thus, more efficient code and greater com- 
puting power are highly desirable, since they translate 
into more data being analyzed and therefore an increase 
in the sensitivity of gravitational wave searches. 

A promising method for blind searches involves ex- 
ploiting large-scale correlations in the coherent detection 
statistic [23]. Another method that has been success- 
ful in these kinds of searches is the hierarchical scheme 
of incoherently combining coherent sets of data. Some 
of the methods currently under use include the Hough 
transform and stack-slide (Tsj . 

In this paper we focus on an efficient implementation 
of coherent techniques. The method we present here is 
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similar to several previous implementations |25l - l27l |. in 
that it uses fast Fourier transforms (FFTs) to calculate 
the so-called J^-statistic the logarithm of the like- 
lihood function maximized over the intrinsic (and un- 
known) parameters of the gravitational wave produced 
by a neutron star, but with one very important differ- 
ence. We resample the time domain data to the Solar 
System barycenter before taking a FFT. This allows us 
to use a single FFT to calculate the detection statistic for 
arbitrarily many frequencies and an arbitrary amount of 
observation time, while previous implementations have a 
maximum frequency band and observation time that can 
be calculated with a single FFT, which are determined 
by losses due to phase mismatch. Another set of tech- 
niques, described in '28', 'ssi Is^] , iinplement stroboscopic 
resampling methods described in [32|. The stroboscopic 
method requires data at full bandwidth, and is therefore 
not suitable for distributed computing applications such 
as Einstein(Q)Home. 

In Section |ll] we review the signal properties and the 
nearly-optimal coherent statistic that can be used to 
extract continuous signals from interferometric gravita- 
tional wave data. In Section IIIII we discuss how to 
implement the calculation of the nearly-optimal coher- 
ent statistic in a computationally efficient way in both 
the time and frequency domains. In Section IIVI we de- 
scribe the results of a computer code using this algorithm 
on software injections of gravitational waves into gaus- 
sian noise. Lastly, in Section |V] we address important 
technical issues to do with practical implementations of 
barycentric resampling. 



is 90°). The functions a{t) and b{t) both depend on time 
and location of source and detector, but are independent 
of the polarization angle ip. 

In the detector frame the phase of a gravitational wave 
produced by an isolated neutron star can be written 




fe=o y ^ k=o 

(4) 

where $0 is the phase at the start time of the observation, 
/q'^'* is the fc"^ derivative of the frequency, c is the speed 
of light, a and S are the right ascension and declination of 
the source, no — no (a, (5) is the unit vector of the source 
in the Solar System barycenter (SSB) reference frame, 
is the position vector of the detector in the same frame, 
and s is the order of the expansion. Neglecting changes 
in the proper motion of the star, the third term in Eq. ^ 
is a correction to the phase due to the detector motion 
relative to the neutron star. 

We can define ^{t) = vl'(i) — <I>o(i), as well as defining 

s -f/c+1 o ^ 4-k 

it) = 2.J: ' + ^no . r,it) J: L (5) 
fe=i ^ '' fe=i 

and 

no ■ Vd{t) , , 

Ua = . (6) 

c 

Equations (O and (|6]) let us write 



II. PRELIMINARIES 

In this section we closely follow the method of Jara- 
nowski, Krolak, and Schutz [l5j to provide the back- 
ground on the signal and the detection statistic. Power- 
recycled Fabri-Perot Michelson interferometers such as 
those used by the Laser Interferometer Gravitational 
Wave Observatory (LIGO) are sensitive to the strain 
caused by gravitational waves passing through it. The 
strain measured at a detector can be written as 11511 



h{t) ^ F+{t)h+{t) + F^{t)h^{t), 



(1) 



where t is the time in the detector frame, and /i+ and 
are the "plus" and "cross" polarizations of gravitational 
wave. F^(t) and Fx{t) are the beam-pattern functions 
of the interferometer and are given by 



and 



F+{t) ^ smC[a{t) cos2tp + b{t) sm2-ip], (2) 



Fx (t) = sin C[b{t) cos 2ip - a{t) sin 2ip] , (3) 



where tp is the polarization angle of the wave and C is the 
angle between detector arms (which in the case of LIGO 



<Pit)= 27: f[t + Un{t;a,d)]+<i>sit-J^''\a J), (7) 

which has the modulation due to the detector's motion 
around the SSB clearly separated from the modulation 
due to the gravitational wave's intrinsic frequency, al- 
though not the derivatives of the frequency. 

An almost optimal statistic for the detection of con- 
tinuous gravitational wave signals is called the 
statistic [TBI . [iGj . It is the logarithm of the likelihood 
function maximized over the intrinsic and unknown sig- 
nal parameters. The J^-statistic is given by 



J" 



4 B\Fa\^' + A\Fb\^-2CRiFaF*) 



Sh{f)To 



D 



(8) 



where Sh{f ) is the one-sided spectral density of the de- 
tector's noise at frequency / and Tq is the observation 
time. A, B, C, and D are given by 

A = (a||a); B = {b\\b); C = {a\\b)\D ^ A ■ B - (9) 



with 



i^\\y) = Tfr I x{t)y{t)dt. 



(10) 
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Fa and Ff, are integrals defined as 

r— 



and 



-To 



b{t)x{t)e-"^^'^dt. 



(11) 



(12) 



We define a new time variable called ti, as follows: 

tb=t + Ua. (13) 

Taking a derivative with respect to t on both sides of 
Eq. (dSl), we get 



A. Time Domain Analysis 

The J^-statistic can be calculated from a time series di- 
rectly by following the steps outlined in Section [TTl How- 
ever, due to the large amounts of data involved, it is im- 
practical to do this for the entire data set. One way to ad- 
dress this problem is to divide the data into band-limited 
time series, making it possible to analyze one small sub- 
band at a time. Time series spanning different frequency 
bands are then analyzed in parallel on a Beowulf cluster 
or a distributed computing system. In this section we 
provide details on how this is accomplished in the time 
domain, and address some of the difficulties that arise. 



dtb _ ^ dtm 
Itt ^ ~dF 
From Eqs. © and (HH), we get 



no ■ Vrf(t) 



(14) 



(15) 



where 'Vdit) is the velocity of the detector in the SSB 
frame and thus is the Doppler shift of the source 

with respect to the detector. For a detector located on 
Earth, the maximum Doppler shift experienced is of the 
order of 10"''. Using this fact and equation [l3] we get 
dtb « St. 

We can thus rewrite the Eqs. for Fa and Ff, as 

r— 

Fail) = / ' a(4)x(4)e-2../t,g.$4*.)d4, (16) 



and 



To 



b{tb)x{h)er^^'f''e'^'^''Utb (17) 



which are just the Fourier transforms of the resampled 
data and the detector response, multiplied by a phase 
g»*,(t6) Eqs. (HSl) and ^ can be efficiently eval- 

uated using FFTs. Details of the resampling procedure 
can be found in Sec. IIII A21 



III. IMPLEMENTATION OF BARYCENTRIC 
RESAMPLING 

Gravitational wave detectors collect data at the rate of 
about 16-20 kHz for spans of time on the order of a year. 
This means that typical searches for gravitational waves 
will involve on the order of a terabyte (TB) of data. Com- 
puters currently have memories of a few gigabytes (GB), 
making it necessary to break up the data into pieces that 
can fit in the memory of a single computer. To analyze 
the full data set hundreds to thousands of these comput- 
ers can then be used together in the form of a Beowulf 
cluster, or tens to hundreds of thousands with distributed 
computing systems such as EinsteinQHome [22j|. 



1. Heterodyning, low-pass filtering, and downsampling 

Let the output of the instrument be the time series 
x{t), and its Fourier transform be 



i(/) 



x{t)e'^'''f'dt. 



(18) 



If we consider the Fourier transform of the complex time 
series Xh{t) = x(t)e'^'"''^''* , 



Xhif) = I x{t)e''"f>^'e-^'''^*dt 

) 

x{t) ■ e-^'^'^^-f'^^'dt 



(19) 



it is obvious that multiplying the time series x{t) by 
g'^TTifht j^g^g giiif^efi all iiyQ frequencies in the time series 
x(t) by fh- This procedure is referred to as complex het- 
erodyning. 

If just a small frequency band B of data around fh is of 
interest, low-pass filtering followed by downsampling can 
be used to reduce the bandwidth of the data appropri- 
ately. Specifically, if we wish to downsample by a factor 
D, the new Nyquist frequency of our time series will be 
given by 



/Nyq,i 



/Nyq,old 

D 



B 
2"' 



(20) 



A simple but effective downsampling technique involves 
picking every D^^ point in the time series. To avoid alias- 
ing effects however, prior to downsampling a low pass 
filter must be applied to the data with a sharp fall-off 
around the new Nyquist frequency. The heterodyned, 
band-limited, downsampled complex time series will have 
a sampling time At — 1/B. For example, suppose we are 
only interested in analyzing data between 990 Hz and 
1 kHz. By multiplying the data with the phase factor 
g27r(995)»t^ data at 995 Hz moves to Hz (DC), 990 Hz 
moves to -5 Hz, and 1 kHz to +5 Hz (we have taken t 
to be measured in seconds). To avoid aliasing problems 
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when we downsample, we low-pass filter the data at 5Hz, 
the new Nyquist frequency. We can then downsample by 
picking one point out of every 100. The resulting com- 
plex time series will be sampled at 10 Hz and contain all 
the information in the original time series between 990 
Hz and 1 kHz. 



2. Barycentric resampling and heterodyne correction 

In this section we explain how to use the low band- 
width heterodyned complex time series to compute the 
J^-statistic given by Eq. ([8|). 

In the following we will work only with Fa- The proce- 
dure for _FJ, is completely analogous. It is easiest to begin 
with the integral definition for Fa in Eq. ((TT|l with the 
phase explicitly written out, namely, 



Fc 



a{f) = Jl a(i)x(i)e-2"'-^(*+*'")e'*=(*)di, (21) 



and a similar expression holds for Fi,. The heterodyned 
version of Fa is 

Faif - fh) = / a(i)x(i)e-2"'(^-^'-»*+*'")e'*=(*)di. 

(22) 

If we already have a complex heterodyned time series 
Xh{t) (heterodyned in the detector frame), we can use it 
to absorb some (but not all) of the heterodyne exponent 
in Eq. p2|) as follows: 



x{t)e 



-27rj(/-/fc)(i+i,„) _ 



Xh{t)e 



27rj/ht„g-27ri/(t+t„)_ 



(23) 



This means that rather than Eq. (|22|) . we should evaluate 



Faif -h)^ I a(i)z(t)e-2-^'(*+*")e'*=(*)di, (24) 



where 



(25) 



At this point we have an expression which looks like 
Eqs. ([TT]) and ([T2|) . and we can write the integral over 
t instead as an integral over ti,: 

r — 

Fa{f~h)^ I a(4)z(4)e-2-^*^e'*=(*'')d4, (26) 
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FIG. 1: Graphical description of the resampling procedure 



and a similar expression holds for Fi,: 



N 

Ftif - h) = E &(^')^(i')e"'"^*'e^*»(*^)d4, (28) 

k=l 



where is the fc"^ datum in the time series as measured 



in the barycentric frame and dti, = t^'^ — t^. The rela- 
tionship between tb and t can be written as 

t'^ =t'' +t^{t'';a,S). (29) 

This relationship between t'^ and t'^ can be used to cal- 
culate z{t^) from the time series z{t''). In practice, one 
starts out with z{t'^), i.e. data sampled regularly in the 
detector frame. Then we calculate T^{t^), which are de- 
tector times corresponding to regularly spaced samples 
in the barycentric frame. These T^{t^) are irregularly 
sampled in the detector frame, but since we have z{t^), 
we can calculate z{T''{t^)) by using interpolation. The 
interpolated time series z{T''{t^)) is the z{t^) of Eqs. ([77|) 
and (12811 . A similar procedure may be used to calculate 



the a(t^) from a{t''), and the b{t^) from b{t''). The factor 
of e'*-(*'>) in Eqs. ^ and 1^ is calculated usmg equa- 
tion ([5]). In this case, instead of calculating $s(t'^), we 
calculate <i>s(T*''(<^)), which is equivalent to calculating 
^s{tb)- While in theory one has to calculate the quantity 
1^0 ■ ^d{t) in equation in practice this information is 
already encoded in T''{t'^) as 



no-rrf(t)=t„.c=(t^-r'=(i,^)) 



(30) 



with a similar expression for Fj,. 

The discrete version of Eq. (1^51) for a time series with 
N points reads 



N 



Faif - fk) = Y.aitt)zitt)e-'-'f'^e^^('^Uu, (27) 



k=l 



With all the parts of Eqs. ([27)) and (1281) hand, we can 
compute Faif - fh) and Ftif - fh)- 

In summary, the procedure is the following: 

1. Start with a heterodyned, band- limited, downsam- 
pled Xhit^) with regularly spaced in time, in the 
frame of reference of the detector. 
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2. Correct the Xh{t^) for the heterodyning done in the 
detector frame by multiplying with e^'^'^i^tm to pro- 
duce the z{t'^). 

3. The z{t^) correspond to data irregularly spaced in 
the barycentric frame. Calculate T''{t'^), which are 
times in the detector frame corresponding to regu- 
larly sampled solar system bary center times. 

4. Using interpolation, calculate z{T''{t'l^)) from z{t^), 
which is the z{tl) used in Eqs. ^ and 

5. Similarly, from a{t'^) and b{t^) calculate a{t^) and 

respectively. 

6. Using FFTs, evaluate Eqs. (HH) and ^ to calcu- 
late - /„) and^^,(/-/,0. 

7. Use Eq. ([5]) to calculate the J^-statistic. 



1. Dealing with non- stationary and colored data 

To deal with non-stationarities, variations in the noise 
floor from SET to SET, and colored data, we can nor- 
malize our SET data to absorb the 1/Sh{f) term in the 
definition of the 7^-statistic in Eq. ([8]). If Xa,k is the 
^th frequency bin of the a^^ SET, then we can redefine a 
normalized data point Xa^k as 

Xa,k > = , ' , (31) 

Y ^a,k 

where Sa,k is an estimate of the one-sided power spec- 
tral density for the k^^ frequency bin of the a**^ SET. 
Estimators used for this purpose should be robust in the 
presence of spectral features in the data, such as a run- 
ning median. 

2. Merging SFTs into long time-baselme Fourier 

transforms 



B. Frequency Domain Analysis 

In the previous section we describe a practical way of 
calculating the J^-statistic from time series data. How- 
ever, in practice the calculation is done in the frequency 
domain for a couple of reasons. One is that much of 
the code written in the LIGO Scientific Collaboration's 
(LSC) Continuous Waves working group is tailored to an 
analysis performed in the frequency domain and hence 
there exist many data processing and validation tools to 
process the data that are useful to this code. Another 
reason is that gravitational wave detectors are subject to 
many sources of noise, some of which change daily or even 
hourly, such as wind, microseism, earthquakes, anthro- 
pogenic noise, etc. These change the noise floor of any 
analysis as a function of time. Working in the frequency 
domain is a natural way to deal with this problem. 

We begin a frequency domain analysis by taking short 
time-baseline Eourier transforms of the time domain 
data, called short Eourier transforms (SETs). When we 
calculate the J"-statistic, we divide by the noise in the 
instrument at that frequency, as shown in Eq. ^ . How- 
ever, Eq. ([5]) assumes the noise is stationary. To account 
for the non-stationarity of the noise we need to weight 
by the noise over time, which is done on a per SET ba- 
sis. This normalization process is described in the next 
section. 

The computational cost of estimating the noise per 
SET scales with the number of SETs and thus for a fixed 
observation time scales inversely with the time-baseline. 
A compromise is needed between the demands of compu- 
tational time and relative stationarity of the detector for 
a given time-baseline. In LIGO, SETs are usually 1800 
seconds long, since the detector is reasonably stationary 
for that time. 



There are many practical difficulties that arise when 
dealing with SETs. Often contiguous chunks of data have 
to be divided up into multiple SETs and it is necessary 
to coherently combine them into one long time-baseline 
SET. This is done using the Dirichlet kernel, which is 
the equivalent of a sine interpolation (ideal interpolation) 
done in the time domain. In order to keep the compu- 
tational cost down, the Dirichlet kernel is truncated at 
a finite number of points (usually around 16). This in- 
troduces a slight interpolation error, which cannot be 
avoided without sacrificing a large amount of computa- 
tional power. 

Suppose we divide the data x{t) of length Tq into M 
short chunks of length Tsft each with N points, so that 
To = MTsFT- The discrete Eourier transform (DET) of 
the data is 

NM-l 

Xt= J2 2^(6-2-^'^/^^^ (32) 

1=0 

where xi = x{lAt), At is the sampling time, and b is 
a long time-baseline frequency index. We can write the 
Eourier transform in terms of two sums: 

M-l N-1 
a=0 j=0 

where Xaj = x{{j -\- Na)At). We can express the 
terms of an inverse DET of a short chunk of data, 

fc=0 

where the Xa,k are the starting SET data, 

N-l 

Xa,k — ^ ^ ■^a.j 6 ^ . (35) 

3=0 



6 



Replacing Xa,j with Eq. (p4|) in Eq. (|33|) gives 



Q=0 i=0 \ A;=0 / 



\ E E E ^ 

fc=0 j=0 



Q=0 



(36) 



The last sum in this expression can be evaluated analyt- 
ically. In particular, 



N-l 

E 

J=0 



1 - Z 



Nc 



(37) 



We take z = e, c = —iy/N, with y = 27r(fe/Af — fc), so 
that the sum is given by 

N-l 



E^ 

3=0 



1 — p~^V 

-iyj/N _ ^ (no\ 



In the large iV limit the exponent of the denominator will 
be small so that 



1 - e-'y 



1 - e-'y 



iN , 



1) 



1 _ p-iy/N 1 - (1 - iy/N) y ^ 

, , , sin y 1 — cos y , 
= N{ --i ^ . 

y y 

This means we can write Eq. (1361) as 

a=0 fe=0 

with the Dirichlet kernel 

sin y 1 — cos y 

-Pb.fc = « , 

y V 

and y = 27r(6/M — fc). The function Pf, fc is very strongly 
peaked around y — which is near a value of the fre- 
quency index k* = floor(6/M). This means one only 
needs to evaluate the sum over k for a few terms A/c 
around k* . With this in mind we write 



(39) 



(40) 



(41) 



M-l 



k'+Ak 



e 

a=0 k=k*-Ak 



E X^,kPa,k. (42) 



To produce a heterodyned time series a sub-band of the 
Xb may be selected and inverse Fourier transformed. 



3. Normalized long time-baseline Fourier transforms 



and take a sub-band of Xb, inverse Fourier transform it, 
and produce the heterodyned time series, and correct it 

4) 



to produce z{tb)- In terms of this time series, we can 



write 



N 



(44) 



and 



k=l 



N 



Fbif - A) - E z{tt)bitt)e-''''^'^e^'''('^^ , (45) 



fc=i 



and thus 



4 B\Fa\^ + A\Fb\^ ~ 2C^{FaF;' 
To D 



(46) 



4- Heterodyning 

As shown before in Eqs. (IT^ and (ITOl) . heterodyning 
is a procedure by which the frequency of interest can be 
shifted arbitrarily. When one applies the kind of correc- 
tion in Eq. p^ . we effectively move all the frequencies 
by a set amount. By doing so, we convert the time series 
from a real time series to a complex time series, with the 
same amount of information content. 

Heterodyning in the frequency domain can be done in 
two ways, one in which the time series produced after in- 
verse Fourier transforming is real and another in which it 
is complex. A cosine transform used to heterodyne would 
produce a real time series, but this method is not used in 
an implementation of the technique (see section ITV)) . A 
complex heterodyned time series is produced by inverse 
Fourier transforming a relabelled band of the frequen- 
cies. Since in Eq. p^ . all frequencies are shifted by a 
fixed amount, the equivalent procedure in the frequency 
domain is just relabelling the heterodyne frequency as 
DC and subsequently all the other frequencies relative to 
this new DC. 

Taking the example from Section IIII A 11 we can just 
internally change the labels of the 995 Hz frequency bin 
to DC and 1000 Hz to 5 Hz. Once this relabelling is 
done, the original data will have all shifted by 995 Hz, 
with the 10 Hz from -5 Hz to -|-5 Hz containing all the 
relevant information. If one were using the whole band 
without downsampling or filtering, then this relabelling 
would have to wrap around the Nyquist frequency edge, 
but since the whole purpose of heterodyning is to down- 
sample, it is never necessary to do so. 



With the normahzcd SET data Xa,k fi'om Eq. ^ 
we can construct a normalized version of the long time- 
baseline Fourier transform 



Xb 



M~l 

E 

a=0 



-2iTiba/M 



k'+Ak 

^ ^ Xot , k Pa , k J 

k=k* -Ak 



(43) 



5. Downsampling and low-pass filtering 

Following the time domain algorithm, after hetero- 
dyning the data, it needs to be downsampled and low- 
pass filtered. The downsampling and low-pass filtering is 



achieved by simply throwing out the data that is not in 
the band of interest. The heterodyning is done in such a 
way as to keep the center of the band of interest at DC. A 
Tukey window apphed to the band of interest, keeping a 
httle bit of data on both edges to facilitate the rise of the 
window from to 1, is a good choice of a low-pass filter. 
Once an inverse Fourier transform is performed on this 
smaller subset in the frequency domain, it generates the 
same heterodyned, downsampled, and low-pass filtered 
time series as the time domain algorithm. 
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6. Gaps in the data 



FIG. 2: Pictorial description of data pre-processing 



Data collected by an interferometer will have gaps due 
to periods of downtime. These gaps need to be dealt 
with in a manner that preserves the phase coherence of 
the segments around the gaps. The gaps increase the 
analysis time without contributing any power to the J-- 
statistic, and thus act like a zero padding. 

The data is divided up into a series of contiguous 
chunks and gaps. For each contiguous chunk the SFTs in 
that chunk are normalized, patched up and then a het- 
erodyned, downsampled and low-pass-filtered time series 
is calculated from it. Heterodyning done by relabelling 
is equivalent to multiplying with e^'^*-^'* where tc 
is the start time of the data chunk being heterodyned 
and fh is the heterodyne frequency. If we have multiple 
chunks that are separately being heterodyned, then tc is 
different for each chunk. In the time domain analysis, we 
assumed that the heterodyne reference time is the same 
as the start time of the analysis. In order to achieve the 
same kind of heterodyning, one needs to multiply each 
newly created time series with a correcting phase factor, 
namely 

where is the start time of the overall analysis. 

A Tukey window can then be applied to each of these 
time series to smoothly bring the data to zero at the 
edges, which correspond to the gaps. The gaps are then 
filled with zeros, as no data was collected during those 
times. This procedure is repeated for all the gaps and 
contiguous chunks. At the end, a time series is produced, 
which is contiguous and spans the time of the analysis. 
By ensuring that the timestamps of the first datum of 
each contiguous chunk correspond with the start time of 
that chunk, we ensure that the phase coherence is main- 
tained throughtout. 

7. Summary 

To summarize, a simple algorithm to produce a time 
series equivalent to the one used for the time domain 
analysis is as follows: 

1 . Divide the data into time chunks and Fourier trans- 
form them to create SFTs. 



2. Normalize these SFTs and assign them weights. 

3. Identify contiguous sets of SFTs. 

4. Combine each contiguous chunk of SFTs into one 
long time-baseline Fourier transform (FT). 

5. Create a downsampled, heterodyned, and low-pass- 
filtered time series by inverse Fourier transforming 
the desired frequencies from the FT. 

6. Stitch all these time domain chunks together, filling 
gaps with zeros. 



IV. RESULTS 
A. Speed 

The scheme previously used to compute the T- 
statistic, involved the use of the Dirichlet kernel to com- 
bine a series of SFTs [sij [13], which were calcu- 
lated for 30 minutes of data taken at 16 kHz. The 
30 minute window was set by the maximum Doppler 
shift due to the motion of the Earth. A C code 
called ComputeFStatistic_v2 [2^ was written in the 
LIGO Analysis Library (LAL) to calculate the J-- 
statistic using this algorithm. The code which imple- 
ments our method is also written in C and is called 
ComputeFStatistic_resamp [s^l • Henceforth we will refer 
to the previous implementation as the LAL implementa- 
tion and our implementation as Resampling. 

The J^-statistic is calculated for a series of templates 
looping over various parameters such as sky location, a 
and S, spin-downs J*^, and various frequencies /. We can 
ignore the way the two implementations deal with loops 
over a, S, and Z*^, since they both loop over them in the 
same manner. The speed of computation for a loop over 
frequencies / is worth comparing, however. 

Assume that we have N data points (take for example 
10^ seconds of data at 100 Hz, i.e. 10® data points). Now 
assume that the number of operations per sky location 
and per spin-down is Nops- If the number of Dirichlet 
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kernel points used is -A^oiLKcr, then the total number of 
operations used by the LAL implementation is: 



^Tot = ^ops • iVDir.Kcr ' A^SFTs ' N , (48) 

Where iVops is defined as the number of operations con- 
ducted in the innermost loop and is approximately of 
order 10, iVDii_Kcr is the number of times the Dirichlet 
Kernel loop is repeated, A^sfts — TsIt number of 

SFTs, and A^ is the number of data points. 

Compare this to the resampling method, which con- 
sists of 4 major steps: 

1. Calculating tb{t), given a sky location and time. 

2. Calculating the integrands of Fa and Ff,. 

3. Interpolating and calculating the beam patterns. 

4. Taking the Fourier transform. 

Each of these steps involves order 10 operations, but 
all of these steps are sequential, therefore they only 
add, resulting in a total number of operations per data 
point, A"^'f™P, of approximately 30 operations. The last 
step is the Fourier transform, which is of order NlogN^ 
therefore the total number of steps is: 

^R^samp ^ (TvR™ + log AT) . at . (49) 

Therefore the ratio of operations between the two meth- 
ods is 

A^Tot^ _ ^ops • A^Dii_Kcr ' A^SFTs 
^Rcsamp - N^^i^"'" + \og N ^ ' 

To first order, we have 

^Rosamp log iV ' ^ ' 

Therefore for large observation times, this method of 
calculating the J^-Statistic is faster and, in the case of a 
targeted search, it allows for a large parameter space in 

The speed-up in practice is reduced by a few practi- 
cal issues as seen in section |Vl However, Resampling 
is still considerably more efficient than the LAL imple- 
mentation. For Einstein@Home, because of the relatively 
small coherent integration time, the speed-up is around 
10. But for targeted searches that span multiple months 
or years, the improvement can be as high as a factor of 
2000. Thus, while some targeted searches which integrate 
over a couple of years were impossible to do previously, 
they are now possible. 



Histogram of F-Stat values at injected frequecy 
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FIG. 3: Histogram of results of Monte Carlo simulation with 
signals injected in different instances of noise 



B. Validations 



The probability density distribution of the J^-statistic 
for Gaussian noise of zero mean and unity standard de- 
viation is a distribution with four degrees of freedom. 
In the presence of a signal, the distribution is a of 
four degrees of freedom with a non-centrality parameter 
given by the J-'-statistic in the absence of noise for the 
particular signal. 

Resampling uses various approximate methods in the 
calculation of the J-"-statistic, and this can lead to dis- 
agreements between the theoretical J^-statistic probabil- 
ity density function and the output of the code. These 
changes are of the order of a few percent and are within 
acceptable limits. The validity of the code can be tested 
by using a Monte Carlo simulation of about a million dif- 
ferent injections of the same signal in different instances 
of noise. The noise is generated as a Gaussian noise of 
zero mean and unity standard deviation, and the signal 
is added into this noise. For each individual injection 
the signal is chosen with a given set of amplitude pa- 
rameters and a fixed sky location and spindowns, and 
the search is conducted over these exact chosen param- 
eters in order to avoid any mismatches. These Monte 
Carlos are then repeated with another set of parameters, 
which are themselves chosen randomly. While it is not 
an exhaustive test, randomly chosen parameters ensure 
that we are not biased in the validation test. The plot 
in figure [3] is produced by performing one such Monte 
Carlo simulation. In this case, both the LAL implemen- 
tation and Resampling were run on the same set of data. 
The J-'-statistic was picked out at the appropriate fre- 
quency and this was repeated about a million times. A 
histogram of these J^-statistic values was then plotted. 
As one can see, there is very good agreement in between 
the expected distribution of the J-'-statistic and the two 
implementations. 



9 



V. PRACTICAL CONSIDERATIONS 
A. Discreteness 

In the implementation of the algorithm explained 
above, one major obstacle is the fact that the data col- 
lected by any physical instrument is discrete and thus 
must be handled appropriately. Take, for example, the 
heterodyne frequency used in the calculation. This fre- 
quency cannot be chosen arbitrarily, as only certain fre- 
quencies are sampled and thus there are only certain per- 
mitted choices. 

Most major FFT computation algorithms output the 
frequency series in a specific format, which split the data 
into two parts. The first bin output by these algorithms 
is the DC followed by the first positive frequency bin 
up to positive Nyquist and then follows this up with 
the negative frequencies starting at the negative Nyquist 
frequency. This order of placing frequency bins speeds 
up computation and is necessary for the internal work- 
ings of these algorithms. Thus when an inverse FFT is 
performed on the frequency domain data in the form of 
SFTs, a simple reshuffling needs to be done. The fre- 
quency selected to be the first bin will become the new 
DC and thus the data will have been heterodyned by 
that said frequency. In order to ensure that the same 
frequency bin is chosen as DC, one needs an odd num- 
ber of bins per SFT. If the number of bins are even, then 
upon increasing the amount of data it can shift this num- 
ber to an odd number as the increase is always done by 
changing the number of SFTs. But if the number of bins 
per SFT is odd, then it will remain odd for any number of 
SFTs. This ensures that there is no mismatch in choosing 
the appropriate bin as the heterodyne frequency. 



to discard the higher frequencies. 

VI. SUMMARY AND CONCLUSIONS 

In this paper, we describe an efficient implementation 
of the bary centric resampling technique, which deals with 
the non-stationarity of the detector and calculates the 
J^-statistic. Although the calculation of the J^-statistic 
has been targeted, this technique can be used for many 
other kinds of searches. The major contribution of this 
technique is to remove the Doppler shift of the Earth's 
motion in a gravitational wave signal. Thus, once this 
Doppler shift is removed, both frequentist and Bayesian 
techniques can be applied to the data. In the process of 
implementing this algorithm, a series of practical issues 
are dealt with, including constraints of modern computer 
memory, discreteness of the data taken, losses due to in- 
terpolation, and gaps in real data. 

The computational savings due to this technique can 
be used in various ways. One such use is to increase 
the coherent integration time for all-sky searches like the 
Einstein@Home searches. Currently Einstein@IIome 22 1 
uses 40 hour long coherent integration time. The resam- 
pling code will be about 10 times faster for such integra- 
tion times, and for the same computational power and 
keeping the same scaling for the search, we can coher- 
ently integrate 64 hours instead, which corresponds to a 
sensitivity increase of about 25%. 

The resampling technique is most effective for long in- 
tegration times, which are feasible for targeted searches 
like the search for gravitational waves from the Crab pul- 
sar ^21j] . The computational savings can be used to search 
over wider parameter spaces like more spindown param- 
eters or to search over binary systems. 



B. Interpolation Issue 
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When the resampling algorithm is used on discrete 
data, one needs to interpolate between data points to 
go from the detector frame to the SSB frame. This inter- 
polation acts like a nonlinear low-pass filter and destroys 
the power at higher frequencies in the band of analysis. 
Since the filter is nonlinear, the frequency response is not 
well-defined, and thus there is no way to compensate for 
the power loss at high frequencies. The power loss can 
be significant (of order 30%) and is unacceptable in most 
analyses. The exact nature of the filter depends on the 
type of the interpolation routine used and the sky loca- 
tion that one resamples to. The only work around is to 
perform the computation over a larger band than the one 
desired. In practice it is sufficient to double the band and 
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